
tm: Cleanup include lookup #1991
Merged: 3 commits merged into geany:master on Nov 13, 2018

Conversation

@b4n (Member) commented Nov 10, 2018

Don't use the file's inode as the hash. Although it looks like a good idea for de-duplicating links as well, it has several issues, including non-uniqueness of inodes across file systems.

The way it was done, hashing the inode but comparing the file name string pointers, also made the hash mostly irrelevant: it just stored filenames sharing the same inode in the same hash bucket, without actually doing any de-duplication, making the whole thing a convoluted way of converting to a list.

Instead, hash and compare the filenames themselves, which, even though it doesn't handle de-duplication of links, is better than the non-functional previous code.

Also, directly build the list and only use the hash table to check for duplicates, which is both faster and gives a stable output.

See #1989
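
For illustration, here is a minimal sketch of the approach described above (not the actual tagmanager code; the helper name collect_unique is made up): a GHashTable keyed by the filename strings themselves (g_str_hash / g_str_equal) is used only to detect duplicates, while the result is built directly as a GList.

    #include <glib.h>

    /* Collect unique filenames, keeping the output order stable. */
    static GList *collect_unique(char **filenames, gsize n)
    {
        GHashTable *seen = g_hash_table_new(g_str_hash, g_str_equal);
        GList *list = NULL;
        gsize i;

        for (i = 0; i < n; i++)
        {
            /* the hash table is only used to check for duplicates... */
            if (! g_hash_table_contains(seen, filenames[i]))
            {
                g_hash_table_add(seen, filenames[i]);
                /* ...the result list is built directly */
                list = g_list_prepend(list, filenames[i]);
            }
        }
        g_hash_table_destroy(seen);
        /* prepending builds the list back to front, so reverse it once
         * at the end to keep the input (command-line) order */
        return g_list_reverse(list);
    }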

@bmwiedemann

The whole tm_file_inode_hash function can also be dropped, because it is unused.

@b4n (Member, Author) commented Nov 12, 2018

Oops, that's what you get for making changes and committing in a hurry. Fixed, and this time I built it first and ran our tests.

@bmwiedemann

There are 6-9 lines of code duplication around the g_list_prepend calls; that could be improved in a later PR.

@b4n (Member, Author) commented Nov 12, 2018

@bmwiedemann what do you mean, between the glob and non-glob versions? I don't really mind, given that the glob conditional is not trivial.

@b4n (Member, Author) commented Nov 12, 2018

I just added two extra things here:

  1. Process files in the order they appear on the command line. Before, the order was stable for a given command line but reversed, which IMO is clearly not the expected behavior; that said, basically nobody should care. (See the short example after this list.)
  2. I added a test case that verifies the processing order.
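
For illustration, a hypothetical use of the collect_unique() sketch above (the filenames are made up): the duplicate is dropped and the command-line order is preserved by the final g_list_reverse().

    char *argv_files[] = { "foo.c", "bar.c", "foo.c" };
    GList *files = collect_unique(argv_files, G_N_ELEMENTS(argv_files));
    /* files now holds "foo.c", "bar.c", in that order; a prepend-only
     * build without the final reverse would yield "bar.c", "foo.c" */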

@elextr (Member) commented Nov 12, 2018

LGBI

Is the runner.sh change really part of this, or is it a general change that possibly should be separate?

@b4n (Member, Author) commented Nov 12, 2018

Is the runner.sh change really part of this, or is it a general change that possibly should be separate?

Well, it's not really specific to this, but it's needed for this test case because it requires parsing more than one input file at once, which the current automated setup doesn't allow. So yeah, it can be used by more test cases in theory, but in practice we didn't have a use case for it until now.

@elextr (Member) commented Nov 12, 2018

but it's needed for this test case because it requires parsing more than one input file at once

That's fine then.

b4n merged commit 8b68c5a into geany:master on Nov 13, 2018
b4n added a commit that referenced this pull request on Nov 13, 2018:
Process files in the order they appear on the command line when
generating tags file, instead of a more or less random order.

Closes #1989.
b4n added this to the 1.34 milestone on Nov 30, 2018